A Global Data Category Registry for Interoperable Language Resources
Abstract
ISO TC 37 is creating a Data Category Registry (DCR) as an online, open-source, RDF-based resource for use by implementers of electronic language resources, including terminologies, presentational and non-presentational lexical resources, NLP lexica, etc. The DCR will allow dynamic generation of data category selections (DCSs), i.e., subsets of the collection reflecting various thematic domains and different data category classes and functions. The DCR will facilitate interchange and interoperability in heterogeneous environments. Participation of a wide range of experts from the broader computing community is important, as is the provision of user-friendly guidance for implementers of databases and other resources.
Related resources
Standardizing a Component Metadata Infrastructure
This paper describes the status of the standardization efforts for a Component Metadata approach to describing Language Resources with metadata. Different linguistic and Language & Technology communities, such as CLARIN, META-SHARE and NaLiDa, use this component approach and see its standardization as a matter for cooperation that has the possibility to create a large interoperable domain of joint ...
An API for accessing the Data Category Registry
Central ontologies are increasingly important for managing interoperability between different types of language resources. This was the reason for ISO to set up a new committee, ISO TC37/SC4, to take care of language resource management issues. Central to the work of this committee is the definition of a framework for a central registry of data categories that are important in the domain of language ...
Foundation of a Component-based Flexible Registry for Language Resources and Technology
Within the CLARIN e-science infrastructure project, it is foreseen to develop a component-based registry for metadata for Language Resources and Language Technology. With this registry it is hoped to overcome the problems of the currently available systems with respect to inflexible fixed schemas, unsuitable terminology and interoperability problems. The registry will address interoperability needs...
ISOcat: Corralling Data Categories in the Wild
To achieve true interoperability for valuable linguistic resources, different levels of variation need to be addressed. ISO Technical Committee 37, Terminology and other language and content resources, is developing a Data Category Registry. This registry will provide a reusable set of data categories. A new implementation of the registry, dubbed ISOcat, is currently under construction. This pap...
Metadata Profile in the ISO Data Category Registry
Metadata descriptions of language resources are becoming increasingly necessary, since the sheer number of language resources is increasing rapidly and especially since we are now creating infrastructures to access these resources via the web through integrated domains of language resource archives. Yet, the metadata frameworks offered for the domain of language resources (IMDI and OLAC), although mat...